Abstract- A data set consisting of North Atlantic right
whale (Eubalaena glacialis) vocalizations was provided as part
of the 2003 International Workshop on Detection and
Localization of Marine Mammals using Passive Acoustics in
Halifax, Nova Scotia. These vocalizations were processed using
a set of detection and localization algorithms developed as part
of the Marine Mammal Monitoring on Navy Ranges (M3R)
program. Localization is performed using hyperbolic
multilateration on Time Difference of Arrival (TDOA) data
from a two-stage, FFT-based energy detector. Binary FFTs are
computed from the raw time series by thresholding the FFT,
using a time average in each bin as the threshold criterion.
Clicks are detected by comparing the total number of bins above
threshold to a secondary threshold. Detected clicks are split out
of the data stream, and the rest of the data is aligned using
spectrogram cross-correlation. Details of the marine mammal
monitoring algorithms are presented, as well as results from
the data set.

I. INTRODUCTION

North Atlantic right whales have been observed to
generate multiple call types under a variety of circumstances.
Some of the earliest reports are attributed to Schevill and
Watkins [1], who recorded North Atlantic right whales
during feeding. Calls typically span a fundamental frequency
range from 100 to 400 Hz [2], although vocalizations in
excess of 4 kHz have been reported [3]. Sounds associated
with baleen rattle range up to 9 kHz, with dominant
frequencies between 2 and 4 kHz [4], while Blows and
Gunshots have been reported in excess of 10 kHz [3].
Southern right whales have been reported to have a similar
frequency range (50-500 Hz) [5,6], and it has been noted that
the apparent difference between the two species is probably
artificial [7]. Call types have been identified for Southern right
whales and have been linked to specific activities [8]. More
recently, a series of call types has been associated with surface
active groups (SAGs) of North Atlantic right whales [3]. The
call types identified are similar to those for Southern right
whales and include Screams, Warbles, Blows, Upcalls,
Downcalls, and Gunshots. Several methods have been proposed
to detect and localize right whales based on these sounds,¹
including spectrogram analysis [9,10], independent component
analysis [11], model-based comparison [12], and neural
networks [13]. A comparison between neural network and
spectrogram analysis has shown neural networks to be superior
when sufficient training data are available, although spectrogram
methods may be preferable when they are not [13].

¹ The June 2004 issue of the Journal of the Canadian
Acoustical Association is dedicated to detection and
localization of marine mammals, focusing on right whales.

This paper presents results from applying a set of
algorithms for passive detection and localization of marine
mammals on wide-baseline acoustic arrays to a data set of
North Atlantic right whale vocalizations made available as
part of the November 2003 workshop on detection and
localization of marine mammals using passive acoustics
[14,15]. The algorithms use a novel hybrid detection
scheme in which broadband events (typically referred to as
clicks) are separated out of the data stream and processed
separately from the remaining data. The algorithms have
been developed and fielded as part of the Marine Mammal
Monitoring on Navy Ranges (M3R) project.

II.  TECHNIQUE 

The M3R toolkit performs localization using the 2D and 3D
hyperbolic multilateration algorithms described by Vincent
[16]. The inputs are TDOA data from a separate data
association routine, along with a representative sound speed
profile. The same event must be present on at least four
hydrophones to compute a 3D position. If fewer hydrophones
are available (a minimum of three), a 2D position is computed.
An arbitrarily shaped hydrophone array may be used,
although collinear hydrophone geometries must be avoided.
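The M3R positions come from Vincent's algorithms [16] with a sound speed profile; those are not reproduced here. As a hedged illustration of the underlying idea only, the sketch below fits a 2D position to measured TDOAs by Gauss-Newton least squares. It assumes a constant sound speed of 1500 m/s, straight direct paths, and TDOAs referenced to the first hydrophone; the function names are hypothetical.

```python
import math


def predicted_tdoas(pos, hydrophones, c):
    """TDOAs (s) of a source at pos relative to hydrophone 0, straight-line paths."""
    x, y = pos
    r = [math.hypot(x - hx, y - hy) for hx, hy in hydrophones]
    return [(ri - r[0]) / c for ri in r[1:]]


def localize_2d(hydrophones, tdoas, c=1500.0, guess=None, iters=50, tol=1e-6):
    """Gauss-Newton least-squares fit of a 2D position to measured TDOAs.

    hydrophones: list of (x, y) in metres; tdoas[i] = t(hydrophone i+1) - t(hydrophone 0).
    Assumes a constant sound speed c (m/s) and direct paths only.
    """
    if len(hydrophones) < 3:
        raise ValueError("a 2D fix needs at least three hydrophones")
    if guess is None:  # start from the array centroid
        guess = (sum(h[0] for h in hydrophones) / len(hydrophones),
                 sum(h[1] for h in hydrophones) / len(hydrophones))
    x, y = guess
    x0, y0 = hydrophones[0]
    for _ in range(iters):
        r0 = max(math.hypot(x - x0, y - y0), 1e-9)
        # Accumulate the 2x2 normal equations (J^T J) d = -J^T res.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (xi, yi), dt in zip(hydrophones[1:], tdoas):
            ri = max(math.hypot(x - xi, y - yi), 1e-9)
            res = (ri - r0) / c - dt  # predicted minus measured TDOA
            gx = ((x - xi) / ri - (x - x0) / r0) / c
            gy = ((y - yi) / ri - (y - y0) / r0) / c
            a11 += gx * gx
            a12 += gx * gy
            a22 += gy * gy
            b1 += gx * res
            b2 += gy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-30:  # near-collinear geometry: no unique update
            break
        dx = -(a22 * b1 - a12 * b2) / det
        dy = -(a11 * b2 - a12 * b1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:
            break
    return x, y


hyd = [(0.0, 0.0), (4000.0, 0.0), (0.0, 4000.0), (4000.0, 4000.0)]
meas = predicted_tdoas((1200.0, 2500.0), hyd, 1500.0)
print(localize_2d(hyd, meas))
```

The determinant guard mirrors the text's warning about collinear hydrophones: when the geometry is degenerate the normal equations have no unique solution and the update is abandoned.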

Detector 

Detection is performed in multiple stages. The data from
each hydrophone are first run through an N-point fast Fourier
transform (FFT) with variable overlap. For this data set, an
FFT size of 1024 points and an overlap of 75% were chosen.
Figure 1 shows a spectrogram formed from multiple
cascaded FFTs from one of the workshop data set files.
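The framing step above can be sketched in plain Python. This is an illustrative assumption, not the M3R code: it uses a rectangular window (the text does not mention windowing), a simple recursive radix-2 FFT, and hypothetical function names. With 1024 points and 75% overlap, each new frame starts 256 samples after the previous one.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_frames(signal, n_fft=1024, overlap=0.75):
    """Yield the power spectrum |X(f)|^2 (bins 0..n_fft/2) of each overlapping frame."""
    hop = int(n_fft * (1.0 - overlap))  # 256 samples for 1024 points at 75% overlap
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = fft(signal[start:start + n_fft])  # rectangular window (assumption)
        yield [abs(v) ** 2 for v in spec[: n_fft // 2 + 1]]


# A tone centred on bin 100 should peak at bin 100 in every frame.
sig = [math.cos(2 * math.pi * 100 * n / 1024) for n in range(2048)]
peaks = [max(range(len(f)), key=f.__getitem__) for f in power_frames(sig)]
print(peaks)
```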


1-4244-0115-1/06/$20.00 ©2006 IEEE


Report Documentation Page (Form Approved, OMB No. 0704-0188)


1. REPORT DATE: 01 SEP 2006
2. REPORT TYPE: N/A
4. TITLE AND SUBTITLE: North Atlantic Right Whale (Eubalaena glacialis) Detection & Localization in the Bay of Fundy using Widely Spaced, Bottom Mounted Sensors
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Undersea Warfare Center, Code 71, Bldg 1351, 1176 Howell Street, Newport, RI 02841 USA
12. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release, distribution unlimited
13. SUPPLEMENTARY NOTES: See also ADM002006. Proceedings of the MTS/IEEE OCEANS 2006 Boston Conference and Exhibition, held in Boston, Massachusetts on September 15-21, 2006. The original document contains color images.
16. SECURITY CLASSIFICATION OF: a. REPORT unclassified; b. ABSTRACT unclassified; c. THIS PAGE unclassified
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 6
Standard Form 298 (Rev. 8-98), Prescribed by ANSI Std Z39-18


Figure 1: Spectrogram of file S131-10, buoy C (horizontal axis: Time)

Each frequency bin f of the FFT is compared to a time
average of the previous FFT data for that specific bin. If the
energy in bin f exceeds the time average by at least m dB, a
“1” is placed in a binary output map in the slot (bit)
corresponding to frequency bin f; otherwise, a “0” is placed
in the corresponding slot. The output of the first stage, Qi(f,t),
is therefore a binary-valued frequency map derived from the
FFT, which contains a “1” in each frequency bin that
exceeded the time average and a “0” everywhere else.
Frequency maps are produced only when at least one bin is
above threshold. The threshold m is selected empirically
for the data set to be processed and is not currently
normalized for either the sample rate or the FFT size. For
this data set, a value of m = -33 was chosen. The detector
output is plotted as a binary spectrogram in Figure 2.
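The first detector stage can be sketched as below. The text does not say how the per-bin time average is formed, so this sketch assumes an exponentially weighted average of linear power with a hypothetical weight `alpha`, seeded by the first frame; `m_db` plays the role of m, and the function name is hypothetical. The average is updated only after each comparison, so every frame is tested against previous data only.

```python
def binary_maps(frames, m_db, alpha=0.05, eps=1e-12):
    """Stage 1: threshold each bin against its own running time average.

    frames: iterable of power spectra (lists of linear power per bin).
    Yields (frame_index, bits), where bits[f] is 1 if the bin's power exceeds
    its running average by at least m_db decibels. Frames with no bits set
    are skipped, matching "maps are produced only when a bin is above threshold".
    alpha is a hypothetical exponential-averaging weight (an assumption).
    """
    ratio = 10.0 ** (m_db / 10.0)  # m dB expressed as a power ratio
    avg = None
    for t, frame in enumerate(frames):
        if avg is None:  # the first frame only seeds the average
            avg = list(frame)
            continue
        bits = [1 if p >= ratio * (a + eps) else 0 for p, a in zip(frame, avg)]
        if any(bits):
            yield t, bits
        # Update the per-bin average after the comparison (previous data only).
        avg = [(1 - alpha) * a + alpha * p for a, p in zip(avg, frame)]


frames = [[1.0, 1.0, 1.0, 1.0]] * 4 + [[1.0, 1.0, 100.0, 1.0]]
print(list(binary_maps(frames, m_db=10.0)))
```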
Figure 2: Detector output for the spectrogram shown in Figure 1. Data shown
in red are broadband events classified as clicks and removed from the data
stream.


A click is detected by comparing the number of bins set in
each reported frequency map against a threshold, nominally
10. Frequency maps associated with click detections are split
out of the data stream and sent to a data association
algorithm called a “scanning sieve” [17]. The sieve looks for
patterns of received clicks across multiple hydrophones and
matches them, producing TDOAs from the offsets between
the matched patterns. Figure 3 illustrates the click
detections identified from the detector output in Figure 2
and sent to the scanning sieve.
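The click/non-click split in the second stage reduces to a bit count per frequency map. A minimal sketch, assuming maps arrive as (frame index, bit list) pairs from the first stage and that a count at or above the nominal threshold of 10 marks a click (the text does not say whether the comparison is strict). The scanning sieve itself [17] is not sketched here; the function name is hypothetical.

```python
def split_clicks(maps, click_threshold=10):
    """Stage 2: route binary frequency maps by the number of bins set.

    maps: iterable of (frame_index, bits). Maps with at least click_threshold
    bins set are treated as broadband clicks (bound for the scanning sieve);
    the rest are kept for spectrogram cross-correlation.
    """
    clicks, non_clicks = [], []
    for t, bits in maps:
        target = clicks if sum(bits) >= click_threshold else non_clicks
        target.append((t, bits))
    return clicks, non_clicks


maps = [(0, [1] * 12 + [0] * 4), (1, [1, 1, 0, 0] + [0] * 12)]
clicks, rest = split_clicks(maps)
print([t for t, _ in clicks], [t for t, _ in rest])
```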


The remainder of the detector output after the clicks have
been removed is shown in Figure 4. These data are processed
using a technique based on spectrogram cross-correlation
among the available hydrophones. Rather than processing
the entire spectrogram, however, only “non-clicks” are
processed: frequency maps associated with clicks are
dropped from the correlation and are therefore effectively
zeroed out.
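The paper does not detail the correlator, so the following is only one plausible reading, offered as a sketch. Each hydrophone's non-click output is stored as a dictionary from frame index to bit list, so frames that were never reported or were split out as clicks are implicitly zero. Each integer frame lag is scored by the number of bins set in both maps (a logical AND), and the best lag is converted to seconds using the FFT hop. The scoring rule and all names are assumptions; the time resolution of this sketch is one hop.

```python
def xcorr_binary(a, b, max_lag):
    """Cross-correlate two binary spectrograms stored as {frame_index: bits}.

    Frames missing from either dict (no detection, or dropped as clicks)
    count as all zeros. Returns (best_lag, scores); a positive lag means the
    sound reaches hydrophone b later than hydrophone a.
    """
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        s = 0
        for t, bits_a in a.items():
            bits_b = b.get(t + lag)
            if bits_b is not None:
                s += sum(x & y for x, y in zip(bits_a, bits_b))
        scores[lag] = s
    best = max(scores, key=scores.get)
    return best, scores


def lag_to_tdoa(lag_frames, hop, fs):
    """Convert a lag in FFT frames to seconds (hop in samples, fs in Hz)."""
    return lag_frames * hop / fs


a = {10: [0, 1, 1, 0], 11: [0, 0, 1, 1]}
b = {13: [0, 1, 1, 0], 14: [0, 0, 1, 1]}
lag, _ = xcorr_binary(a, b, max_lag=5)
print(lag, lag_to_tdoa(lag, hop=256, fs=1200))
```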
Figure 4: Detector output passed to spectrogram cross-correlator

III.  BAY  OF  FUNDY  TEST  RESULTS 


A data set of North Atlantic right whale vocalizations was
made available as part of the November 2003 workshop on
detection and localization of marine mammals using passive
acoustics [14]. Data collected in 2002 were recorded at a
sampling frequency of 1200 Hz with a low-pass filter at 800
Hz. This filter roll-off frequency, above the 600 Hz Nyquist
frequency, was selected to maximize localization opportunities
for sounds in the upper end of the frequency range [15].
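Applying the Section II detector settings (1024-point FFT, 75% overlap) to these 1200 Hz recordings fixes the spectrogram resolution. The short helper below (a hypothetical name) works through that arithmetic. At 1200 Hz it gives a 600 Hz Nyquist frequency, FFT bins 1.171875 Hz wide, and a new frame every 256 samples (about 0.21 s), which bounds the timing resolution of any frame-by-frame comparison.

```python
def frame_parameters(fs, n_fft=1024, overlap=0.75):
    """Return basic resolution figures for an overlapped FFT at sample rate fs (Hz)."""
    hop = int(round(n_fft * (1.0 - overlap)))  # samples between frame starts
    return {
        "nyquist_hz": fs / 2.0,         # highest representable frequency
        "bin_width_hz": fs / n_fft,     # spacing of FFT frequency bins
        "hop_samples": hop,
        "frame_step_s": hop / fs,       # time between successive frames
        "frame_length_s": n_fft / fs,   # duration spanned by one FFT
    }


params = frame_parameters(1200.0)
for name, value in params.items():
    print(f"{name}: {value:.4g}")
```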

The data set was selected to cover three basic call patterns:
a gunshot, a low-frequency call, and a mid-frequency call.
An additional file containing multiple call types was
available for testing detectors.

To test the capability of the detection and localization
algorithms, only the modifications necessary to account for
differences in sample rate were made. Because the data were
provided as files rather than in real time, a Matlab version of
the existing real-time tracking system was used for the
analysis. The existing direct-path tracking algorithms were
used with no additional provisions for a shallow-water
multipath environment.

The low- and mid-frequency calls were both processed by
spectrogram cross-correlation, while gunshots were
identified as clicks and processed with the scanning sieve.
Binary spectrograms and correlation functions for several of
the sound cuts are shown in Figures 5 through 11.


Figure 5: Detector output, event S131-10, buoy E, low- and mid-frequency
calls. Clicks are identified in red and removed prior to cross-correlation.


Figure 6: S209-14 E buoy, mid-frequency call


Figure 7: S209-14 correlations for master buoy L (horizontal axis: time delay, seconds)
Figure 8: S070-3 L buoy gunshot
Figure 9: S070-3 correlations for gunshot


Figure 10: S110-5 gunshot spectrogram
Figure 11: S110-5 correlations for gunshot


The detection and localization algorithms are well suited
to the low- and mid-frequency calls. However, while the
scanning sieve successfully processed the gunshot sounds
provided in the data set, it may not be appropriate for
automated real-time processing. The scanning sieve is
designed for repetitively vocalizing marine mammals, in
particular echolocating odontocetes. Animals are assumed to
vocalize at a high enough rate for a pattern to be derived.
That pattern is then matched across multiple hydrophones to
determine TDOAs. The same effect may be achieved by
multiple co-located animals vocalizing at a lower rate (per
animal). A single animal emitting a single gunshot will be
successfully matched among the multiple hydrophones.
However, multiple spatially separated animals emitting
gunshots at low repetition rates will confuse the sieve and
lead to erroneous localizations. Multiple spatially distributed
groups vocalizing at higher aggregate rates will similarly
confuse the sieve. Gunshot production rates have been
reported to increase non-linearly with group size [3,7], but
may be quite low (fewer than one per hour at the low end).

A calibration data set was also included, consisting of
recorded right whale calls transmitted from a rigid-hull
inflatable boat (RHIB). The calibration data were collected
during a different deployment of ocean bottom hydrophones
(OBHs), in September 2000. Four OBHs were available. The
sample frequency was 5 kHz, with an anti-aliasing filter at
1 kHz. Figure 12 shows a detection spectrogram from this set
of sound cuts.
Figure 12: Calibration data - S289-OBH B


The data set contained a series of broadband events,
which were classified as clicks. These events were
automatically removed from the data stream before it was
passed to the whistle detector (the spectrogram
cross-correlator). The remaining data included several
low-frequency sweeps. Figure 13 depicts the data stream
processed by the whistle detector after the clicks were
removed.


Figure 13: Calibration data - S289-OBH B - whistles only


Locations were obtained for the sweeps in the data set
(Figure 14). In general, the locations were computed to
within 200 m of the GPS position of the RHIB. The error was
initially thought to be due to unmodeled multipath. Anecdotal
accounts from the workshop indicate, however, that the data
set may have been contaminated by real vocalizing animals in
addition to the RHIB transmissions.


Figure 14: 2D positions from M3R algorithms for the calibration data set. Inset
shows a close-up view of positions around the RHIB. Outliers are 2D
localizations computed from three hydrophones.


While the signals of interest were processed with the
whistle detector, the calibration data set also contained
broadband events, which were classified as clicks.
These events are shown in Figure 15.
Figure 15: Calibration data - S289-OBH B - clicks only


These clicks were present on all OBHs. The click detector
in the M3R tool set localized the events as shown in
Figure 16.


Figure 16: 2D click positions from M3R algorithms for calibration data


IV.  SUMMARY 


repetitively vocalizing marine mammals with sufficient
source levels to be detected on multiple hydrophones.
Application of the algorithms to the North Atlantic right
whale (Eubalaena glacialis) calls contained in the data set
indicates that they are well suited to the low- and
mid-frequency calls, but that applying the scanning sieve to
gunshot sounds may be problematic.